The AI Model Handbook: A guide to the world of artificial intelligence modeling (The Artificial Intelligence Handbook Series 2) by Trinh Minh

The AI Model Handbook: A guide to the world of artificial intelligence modeling (The Artificial Intelligence Handbook Series 2) by Trinh Minh

Author:Trinh, Minh [Trinh, Minh]
Language: eng
Format: epub
Publisher: Rodeo Press
Published: 2021-12-26T00:00:00+00:00


6.3 Classical NLP Modelling

Symbolic NLP

It is possible to teach a computer vocabulary, syntax, and grammar to solve language tasks. This approach is symbolic NLP and uses parsing techniques to identify the words, their roles, and their meanings (Part-of-Speech or POS tagging). Because of the complexity and ambiguity of language and its relative free form, it is difficult to make a hand-written inventory of all the rules required to understand and generate some language.

Another approach is to learn language probabilistically, using a statistical language model that is trained on real-world data. Because of the considerable amount of digital text available with corpora of millions, billions, and even trillions of words, and the large availability of computing power, the statistical approach has gained the upper hand while the symbolic paradigm has not made meaningful progress in real-world applications. MIT Professor Noam Chomsky has been very critical of the statistical approach despite its success. He was quoted as saying:

“It's true there's been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures. There is a notion of success, which I think is novel in the history of science. It interprets success as approximating unanalyzed data.” (“Pinker/Chomsky Q&A from MIT150 Panel”)

Norvig (“On Chomsky and the Two Cultures of Statistical Learning”) has an interesting article addressing his criticism. In particular, he points out the empirical success of these models applied to search engines (Norvig works at Google), speech recognition, machine translation, and question answering.

Language Model

A language model describes the probability distribution of words. It is a statistical representation of language. It answers the question of what is the probability that a word appears after a sequence of words, or what is the probability that a sentence was said vs. another one. This is a compelling approach to develop language applications because it can leverage existing textual data and can tell, for instance, if a sentence is grammatically correct or logical because correct and logical sentences are more likely to occur in the data.

Bag of words

The simplest language model is the bag-of-words model, where only the frequency of each word matters, neither the ordering nor the presence of other words. It is a poor model to generate sentences, but it is helpful to measure sentiment or classify text. If some words tend to appear more frequently in a negative sentence, their presence can indicate that a sentence is likely to be negative, using the Bayes formula of conditional probabilities.

N-gram models

A more advanced approach than the bag-of-words is the N-gram model. In the N-gram model, the probability of each word is conditional (depends) on the previous N-1 words. A bigram model accounts only for the previous word; a 3-gram model will account for the previous two words, etc. Given these conditional probabilities, the probability of a full sentence can be calculated thanks to the law of iterated expectations. It will be expressed as a simple product of conditional probabilities or as the sum of logarithmic probabilities if logarithms are used.



Download



Copyright Disclaimer:
This site does not store any files on its server. We only index and link to content provided by other sites. Please contact the content providers to delete copyright contents if any and email us, we'll remove relevant links or contents immediately.
Popular ebooks
Eco-friendly approach of bio-indigo synthesis and developing purification methods towards isolation of indigo from indirubin and bacterial fragments by Ramalingam Manivannan & Kaliyan Prabakaran & Young-A Son(204414)
Personalized inhaled bacteriophage therapy for treatment of multidrug-resistant Pseudomonas aeruginosa in cystic fibrosis by unknow(172929)
CONSORT 2025 statement: updated guideline for reporting randomized trials by unknow(81402)
Critical evaluation of the ProfiLER-02 study design and outcomes by Vivek Subbiah & Razelle Kurzrock(80992)
Cardiac gene therapy makes a comeback by Oliver J. Müller & Susanne Hille & Anca Kliesow Remes(80750)
Whisky: Malt Whiskies of Scotland (Collins Little Books) by dominic roskrow(74428)
Unveiling the design rules for tunable emission in graphene quantum dots: A high-throughput TDDFT and machine learning perspective by Şener Özönder & Mustafa Coşkun Özdemir & Caner Ünlü(50884)
A yeast-based oral therapeutic delivers immune checkpoint inhibitors to reduce intestinal tumor burden by unknow(40256)
Covalent hitchhikers guide proteins to the nucleus by Alexander F. Russell & Madeline F. Currie & Champak Chatterjee(40214)
Meet the Authors: Christopher R. Mansfield and Emily R. Derbyshire by Christopher R. Mansfield & Emily R. Derbyshire(40091)
Alkaline-earth metals promote propane dehydrogenation with carbon dioxide through geometric effects: Altering the reaction pathway by unknow(32728)
Induced iron vacancies boosting FeOOH loaded on sustainable Fenton-like collagen fiber membrane for efficient removal of emerging contaminants by unknow(32503)
Efficient electric-field-assisted photochemical conversion of methane to n-propanol exclusively over penetrated TiO2Ti hollow fibers by Guanghui Feng(32451)
Bi2SiO5 nanosheets as piezo-photocatalyst for efficient degradation of 2,4-Dichlorophenol by Hangyu Shi & Yifu Li & Lishan Zhang & Guoguan Liu & Qian Zhang & Xuan Ru & Shan Zhong(32382)
A novel NDIPTA organic heterojunction photocatalyst with built-in electric field for efficient hydrogen production by Jiahui Yang & Baojun Ma & Yongfa Zhu(32358)
Enhanced conversion of methane to liquid-phase oxygenates via hollow ferrite nanotube@horseradish peroxidase based photoenzymatic catalysis by Jun Duan & Shiying Fan & Xinyong Li & Shaomin Liu(32330)
Ordered macroporous superstructure of defective carbon adorned with tiny cobalt sulfide for selective electrocatalytic hydrogenation of cinnamaldehyde by Xiao-Shi Yuan & Sheng-Hua Zhou & San-Mei Wang & Wenbo Wei & Xiaofang Li & Xin-Tao Wu & Qi-Long Zhu(32256)
What's Done in Darkness by Kayla Perrin(27139)
Topological analysis of non-conjugated ethylene oxide cored dendrimers decorated with tetraphenylethylene: Insights from degree-based descriptors using the polynomial approach by A Theertha Nair & D Antony Xavier & Annmaria Baby & S Akhila(26517)
Investigation of mechanical and self-healing properties of hydroxyl-terminated polybutadiene functionalized with 2-ureido-4-pyrimidinone by Mohsen Kazazi & Mehran Hayaty & Ali Mousaviazar(26453)